20 research outputs found

    BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

    Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives by using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.

    Implementation of a non-invasive in situ methodology for evaluating the hydration characteristics of lipstick

    This experimental study took as its units of analysis 101 female volunteers from the database of the Efficacy Demonstration and Sensory Analysis Laboratory of the Ebel Technological Institute, all of whom met predetermined inclusion criteria. The sample consisted of 4 lipsticks with different active ingredients, whose cosmetic efficacy was determined by measuring three biological indicators on the volunteers' lower lip: hydration, transepidermal water loss, and pH. The volunteers were divided into 5 groups: experimental groups A, B, C, and D, which received the different lipsticks, and a control group. All groups were measured at baseline, at 30 minutes (immediate), and at 2 and 4 weeks. The moisturizing effect of lipsticks A, B, C, and D at the 30-minute control was highly significant; lipstick C maintained the same level of lip hydration until the end of the treatment. The decrease in transepidermal water loss from the lips with lipsticks A, B, C, and D was highly significant until the end of the treatment. Lip pH did not vary significantly until the end of the treatment. The hydration level obtained with lipstick C was different from, and higher than, that of the other lipsticks. Lipstick C showed better cosmetic functionality, based on the measurement of its attributes through the biological indicators. The active ingredient lipexel is responsible for the efficacy of the attributed cosmetic properties.

    Using a phylogenomic approach to study protozoa

    Fundação Oswaldo Cruz, Instituto Oswaldo Cruz, Rio de Janeiro, RJ, Brasil (issue date: 2010). Reconstructing evolutionary history and establishing hypotheses about the phylogenetic relationships of protozoa and of the genes encoded by Mobile Genetic Elements (MGEs) require various tools and approaches, which are not available in an integrated or user-friendly way. Different phylogenetic, phylogenomic, and evolutionary approaches are necessary to infer species phylogeny and to study poorly conserved genes such as reverse transcriptase, the most representative gene of class I MGEs, the retrotransposons. The main phylogenetic algorithms and programs have been unified into a single system, ARPA, written in the Python programming language. The ARPA system and its web interface are hosted at FIOCRUZ and are available at http://arpa.biowebdb.org. They are being integrated into the ProtozoaDB database system (http://protozoadb.biowebdb.org) and the Stingray semi-automatic annotation system (http://stingray.biowebdb.org/). An approach based on the fundamentals of phylogenomics and evolution was applied to five objectives: (i) to analyze and infer the phylogeny of genes related to drug resistance in protozoa; (ii) to reconstruct a protozoan species tree; (iii) to conduct phylogenomic studies of MGEs in protozoa; (iv) to infer the phylogeny of telomerase and retrotransposable elements in Tri-Tryps; and (v) to adapt and extend the Phylo schema of the GUS database to store phylogenetic information. The main results for each objective were: (i) phylogenetic inference for the genes AQP, hsp70, GP63, TRYR, and MRPA, which are related to drug resistance in protozoa, demonstrated the viability of ARPA executions; (ii) the protozoan species tree built with the supermatrix approach proved reliable, and the PTP test and the G1 statistic showed that the molecular data of this study carry phylogenetic signal; (iii) PAUP-AV was the most consistent program, and PHYML the least, in dealing with the different levels of polymorphism of these genes; positive selection in MGE genes was detected in silico in the paired analyses of models M1-M2 and M7-M8, while the pair M0-M3 indicated high variability of the ω ratio among sites; (iv) telomerase was found to be monophyletic and most closely related to the reverse transcriptase of non-LTR retrotransposons; (v) a new Phylo schema was designed and incorporated into GUS 3.5, extending it to store data obtained from phylogenetic inference. The main conclusions are: (i) ARPA is a viable, efficient, easy, and time-saving alternative for phylogenomic analyses; RAXML was considered the most consistent program, trees built from whole sequences and/or sequences trimmed with TRIMAL gave the best results, and the supermatrix approach gave better results than the supertree approach; (ii) the relationships among protozoan groups agree with previous studies in the literature, which also found protozoa to be monophyletic, although more data/genes are needed to obtain a robust tree; (iii) in the MGE trees, PAUP-AV was the most consistent program and PHYML the least in dealing with the different levels of polymorphism of these genes; model M3 indicated high variability of the ω ratio among sites, and models M7 and M8 indicated positive selection for all MGE genes; (iv) telomerase formed a monophyletic group more closely related to the reverse transcriptase of non-LTR retrotransposons; (v) the Phylo schema stores the data obtained from phylogenetic experiments and preserves the phylogenetic inheritance relationships among taxa, which allows queries using information from the branches, nodes, and taxa of the tree.

    Analyzing provenance across heterogeneous provenance graphs

    Provenance generated by different workflow systems is generally expressed using different formats. This is not an issue when scientists analyze provenance graphs in isolation, or when they use the same workflow system. However, analyzing heterogeneous provenance graphs from multiple systems poses a challenge. To address this problem we adopt ProvONE as an integration model, and show how different provenance databases can be converted to a global ProvONE schema. Scientists can then query this integrated database, exploring and linking provenance across several different workflows that may represent different implementations of the same experiment. To illustrate the feasibility of our approach, we developed conceptual mappings between the provenance databases of two workflow systems (e-Science Central and SciCumulus). We provide cartridges that implement these mappings and generate an integrated provenance database expressed as Prolog facts. To demonstrate its usage, we have developed Prolog rules that enable scientists to query the integrated database.
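
    As a rough illustration of the integration idea in the abstract above, the following Python sketch normalizes records from two differently shaped provenance stores into one shared set of facts and then queries across them. The record layouts, field names, file names, and helper functions are hypothetical and are not taken from e-Science Central, SciCumulus, or the authors' cartridges; only the relation names `used` and `wasGeneratedBy` come from the PROV vocabulary that ProvONE extends.

```python
# Hypothetical records: every field name and value here is invented for illustration.
system_a_rows = [
    {"block": "align", "reads": "seqs.fasta", "writes": "aln.phy"},
]
system_b_rows = [
    {"activity": "build_tree", "inputs": ["aln.phy"], "outputs": ["tree.nwk"]},
]


def facts_from_a(rows):
    """Map system A rows (one input, one output) to (relation, subject, object) facts."""
    facts = []
    for row in rows:
        facts.append(("used", row["block"], row["reads"]))
        facts.append(("wasGeneratedBy", row["writes"], row["block"]))
    return facts


def facts_from_b(rows):
    """Map system B rows (lists of inputs and outputs) to the same fact shape."""
    facts = []
    for row in rows:
        for item in row["inputs"]:
            facts.append(("used", row["activity"], item))
        for item in row["outputs"]:
            facts.append(("wasGeneratedBy", item, row["activity"]))
    return facts


def upstream(data, facts):
    """Return every data item that `data` transitively derives from."""
    generated_by = {s: o for rel, s, o in facts if rel == "wasGeneratedBy"}
    used = {}
    for rel, s, o in facts:
        if rel == "used":
            used.setdefault(s, []).append(o)
    seen, stack = set(), [data]
    while stack:
        step = generated_by.get(stack.pop())
        for source in used.get(step, []):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# One integrated fact base built from both systems.
integrated = facts_from_a(system_a_rows) + facts_from_b(system_b_rows)
```

    With the facts merged, `upstream("tree.nwk", integrated)` follows lineage across the boundary between the two systems, which is the kind of cross-workflow query the abstract describes (there expressed as Prolog rules rather than Python).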

    Exploring Large Scale Receptor-Ligand Pairs in Molecular Docking Workflows in HPC Clouds (2014 IEEE 28th International Parallel & Distributed Processing Symposium Workshops)

    Computer-aided drug design techniques are important assets in the pharmaceutical industry because they support the research and development of new drugs. Molecular docking (MD) predicts the binding modes of a specific compound within the active site of target proteins. Since MD is a time-consuming process, existing approaches reduce the number of receptors or ligands in docking by evaluating only small sets of compounds. This restriction of the search space reduces the chances of uniformly covering the diverse space of compounds and misses opportunities to recognize whether new drugs can be identified. Another difficulty with large-scale docking is analyzing the results, e.g. browsing all directories manually to find which pairs were docked successfully. To address these issues we explored the potential of data provenance analysis and parallel processing of SciCumulus, a cloud Scientific Workflow Management System. We present SciDock, a molecular docking-based virtual screening workflow, and evaluate its execution using 10,000 receptor-ligand pairs related to protease enzymes of protozoan genomes. The overall performance of SciDock using 32 cores in cloud virtual machines reaches improvements of up to 95.4% when running SciDock with AutoDock and 96.1% when running SciDock with Vina. We show how data provenance improved the result analysis and how it may indicate potential protease drug targets for protozoan treatments. Keywords: workflow; cloud; drug discovery

    Capturing and Querying Workflow Runtime Provenance with PROV: a Practical Approach

    Scientific workflows are commonly used to model and execute large-scale scientific experiments. They represent key resources for scientists and are enacted and managed by Scientific Workflow Management Systems (SWfMS). Each SWfMS has its particular approach to executing workflows and to capturing and managing their provenance data. Due to the large scale of experiments, it may be unviable to analyze provenance data only after the end of the execution. A single experiment may take weeks to run, even in high-performance computing environments. Thus scientists need to monitor the experiment during its execution, and this can be done through provenance data. Runtime provenance analysis allows scientists to monitor workflow execution and to take actions before it ends (i.e. workflow steering). This provenance data can also be used to fine-tune the parallel execution of the workflow dynamically. We use the PROV data model as a basic framework for modeling and providing runtime provenance as a database that can be queried even during the execution. This database is agnostic of SWfMS and workflow engine. We show the benefits of representing and sharing runtime provenance data for improving experiment management as well as the analysis of the scientific data.
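
    To make the idea of a provenance database that can be queried during execution more concrete, here is a minimal Python sketch using an in-memory SQLite database. The table layout, column names, activity names, and timestamps are assumptions made for illustration and do not reproduce the authors' schema; only the concepts (activities with start and end times, entities, and the wasGeneratedBy relation) come from the PROV data model.

```python
import sqlite3

# Illustrative schema: names are assumptions, not the schema used in the paper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (id TEXT PRIMARY KEY, started_at TEXT, ended_at TEXT);
    CREATE TABLE was_generated_by (entity_id TEXT, activity_id TEXT);
""")


def start_activity(activity_id, started_at):
    """Record that an activity has begun; its end time is still unknown."""
    conn.execute("INSERT INTO activity VALUES (?, ?, NULL)", (activity_id, started_at))


def finish_activity(activity_id, ended_at, outputs):
    """Close an activity and link each output entity to it."""
    conn.execute("UPDATE activity SET ended_at = ? WHERE id = ?", (ended_at, activity_id))
    conn.executemany(
        "INSERT INTO was_generated_by VALUES (?, ?)",
        [(entity_id, activity_id) for entity_id in outputs],
    )


def running_activities():
    """Activities that have started but not yet ended (a monitoring query)."""
    rows = conn.execute("SELECT id FROM activity WHERE ended_at IS NULL ORDER BY id")
    return [row[0] for row in rows]


def outputs_so_far():
    """Entities already produced, available before the workflow finishes."""
    rows = conn.execute("SELECT entity_id FROM was_generated_by ORDER BY entity_id")
    return [row[0] for row in rows]


# Simulate a workflow that is only partly done (sample values are made up).
start_activity("align", "2024-01-01T10:00:00")
finish_activity("align", "2024-01-01T10:04:00", ["aln.phy"])
start_activity("build_tree", "2024-01-01T10:05:00")
```

    Because provenance rows are written as each activity starts and ends, queries such as `running_activities()` and `outputs_so_far()` can be issued while the workflow is still running, which is what allows monitoring and steering.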